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Abstract 

Bivariate  time  series  which  display  nonstationary  be¬ 
havior,  such  as  cycles  or  long  term  trends,  are  common 
in  fields  such  as  oceanography  and  meteorology.  These 
are  usually  very  large  scale  data  sets  and  often  may 
contain  long  gaps  of  missing  values  in  one  or  both  se¬ 
ries,  with  the  gaps  perhaps  occurring  at  different  time 
periods  in  the  two  series.  We  present  a  simplified  but 
effective  method  of  interactively  examining  and  filling 
in  the  missing  values  in  such  series  using  extensions  of 
the  methods  available  in  AGSS,  an  APL2-based  statis¬ 
tical  software  package.  Our  method  allows  for  possible 
detrending  and  removal  of  seasonal  components  before 
automatically  estimating  arbitrary  patterns  of  missing 
values  for  each  series.  Interactive  bivariate  spectral 
analysis  can  then  be  performed  on  the  detrended  and 
deseasonalized  interpolated  data  if  desired.  We  illus¬ 
trate  our  results  using  a  bivariate  time  series  of  ocean 
current  velocities  measured  off  the  California  coast. 


1  Introduction 

Gaps  of  missing  values  of  various  sizes  are  a  common 
problem  in  many  data  sets.  In  oceanographic  data, 
for  example,  a  single  large  gap  may  arise  in  the  gath¬ 
ering  of  tidal  data  when  an  instrument  stops  working 
and  the  malfunction  is  not  detected  for  several  days. 
Many  small  gaps  are  more  characteristic  of  data  gath¬ 
ered  from  satellites.  The  missing  value  problem  is  com¬ 
plicated  for  bivariate  series  in  that  the  gaps  may  not 
fail  at  the  same  time  periods  in  both  series.  Ad  hoc 
univariate  methods,  such  as  basing  “suitable”  replace¬ 
ment  values  on  the  range  of  values  assumed  by  neigh¬ 
boring  points  or  points  of  the  same  periodicity,  fail 
to  account  for  possible  cross  correlation  in  the  data. 
In  order  to  successfully  analyze  the  spectrum  of  gap¬ 
py  data  sets,  or  use  the  data  for  other  purposes,  the 
missing  values  need  to  be  estimated  in  a  way  that  is 
characteristic  of  the  rest  of  the  bivariate  data  set. 


The  problem  of  missing  values  in  time  series  has 
been  studied  by  several  authors  in  recent  years,  pri¬ 
marily  in  a  state  space  framework.  Jones  (1980) 
used  a  Kalman  filter  recursion  to  calculate  the  ex¬ 
act  likelihood  of  a  univariate  stationary  autoregressive 
moving  average  (ARMA)  process  with  missing  values, 
while  Harvey  and  Pierce  (1984)  and  Kohn  and  Ans- 
ley  (1986)  extended  the  Kalman  filtering  method  to 
nonstationary  autoregressive  integrated  moving  aver¬ 
age  (AIUMA)  processes.  Ansley  and  Kohn  (1985)  gave 
a  method  of  recursively  calculating  the  likelihood  for  a 
multivariate  state  space  model  with  incompletely  spec¬ 
ified  initial  conditions  which  can  be  used  to  interpolate 
an  arbitrary  pattern  of  missing  values  in  multivariate 
time  series.  Both  the  computation  and  derivation  are 
much  simpler  in  the  univariate  case  than  in  the  mul¬ 
tivariate  case.  More  recently,  Ljung  (1989)  derived  an 
exact  expression  for  the  estimates  of  missing  values  in 
a  univariate  ARIMA  process  in  a  form  that  is  useful 
for  examining  the  estimates  and  their  mean  squared  er¬ 
rors.  For  an  arbitrary  pattern  of  missing  values,  how¬ 
ever,  the  computations  axe  not  very  efficient.  None 
of  the  above  methods  is  thus  easy  to  implement  in 
practice  for  bivariate  series  which  are  possibly  nonsta¬ 
tionary  and  have  arbitrary  patterns  of  missing  values 
in  both  series. 

In  this  paper,  we  present  an  algorithm  for  semi- 
automatically  filling  in  gaps  in  bivariate  time  series, 
allowing  for  trends,  cycles,  and  cross  correlation  in  the 
data.  The  interactive  implementation  of  the  aigorith- 
m  allows  for  visual  examination  of  the  data  at  each 
step,  giving  the  practitioner  the  opportunity  to  view 
the  original  data  with  missing  values,  the  “patched” 
data  in  which  the  missing  values  have  been  filled  in 
using  linear  interpolation,  and  the  estimated  autospec¬ 
trum  of  the  crudely  interpolated  data.  After  this  ex¬ 
amination,  one  may  choose  to  remove  a  trend  or  cycles 
from  the  data.  The  remaining  series  is  automatically 
modeled  as  an  autoregressive  process  and  the  estimat¬ 
ed  model  is  used  for  interpolation.  The  method  allows 
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for  joinc  interpolation  of  two  correlated  aeries,  incor¬ 
porating  an  estimate  of  the  autocorrelation  for  each 
series,  and  the  cross  correlation  between  the  two  se¬ 
ries,  into  the  interpolated  values.  At  the  end  of  the 
interpolation  phase,  the  user  has  the  choice  of  exam¬ 
ining  the  coherence  function  of  the  interpolated  series, 
as  well  as  producing  more  detailed  plots  of  particular 
segments  of  the  series.  The  following  section  presents 
the  interpolation  algorithm  in  detail,  while  Section  3 
gives  an  application  of  the  algorithm  to  a  bivariate  se¬ 
ries  of  ocean  current  velocity  meter  readings  measured 
off  the  California  coast. 

2  Algorithm 

The  following  algorithm  has  been  coded  in  APL2 
using  the  IBM  APL2  AGSS  program  as  a  comput¬ 
ing  placform.  Thus  functions  such  as  regression,  the 
Fast  Fourier  Transform  used  for  computing  the  peri- 
odogram  of  the  series,  and  random  number  generation 
from  AGSS  are  used,  as  well  as  some  of  the  AGSS 
graphics  screens.  The  algorithm  is  available  in  a  AGSS 
library  from  the  authors  for  mainframe  or  microcom¬ 
puter  data.  The  algorithm  is  as  follows: 

1.  The  user  is  asked  to  enter  the  name  of  the  original 
series  containing  gaps  and  the  series  is  plotted. 
Denote  this  series  by  x(£),  t  =  1, 2, . . .  ,n.  If  there 
axe  two  series  containing  gaps,  and  the  two  series 
axe  cross  correlated  in  some  way,  the  user  is  asked 
to  enter  the  name  of  the  second  series  and  the 
second  series  is  plotted  as  well.  Denote  this  series 
by  y(t),  t  =  1, 2, . . . ,  n.  The  two  series  must  have 
the  same  length,  but  the  location  and  length  of 
gaps  in  the  series  may  differ. 

2.  The  user  is  asked  to  enter  the  number  used  on  the 
data  record  to  indicate  a  missing  value. 

3.  The  program  then  computes  the  locations  of  the 
gaps  and  fills  in  the  missing  values  for  each  series 
by  linearly  interpolating  between  the  two  points 
on  either  side  of  the  gap.  Denote  the  linearly  in¬ 
terpolated  series  as  X\(t)  and  yj(t)  respectively. 
The  resulting  series  are  then  plotted  and  can  be 
visually  examined  by  the  user  to  decide  whether 
removal  of  a  linear  trend  is  necessary. 

Note:  Steps  4-8  are  appli.d  to  both  the  X\{t)  and 
yi  (i)  series  in  exactly  the  same  manner.  Only  the 
results  for  the  x\(t)  series  are  given  below. 

4.  If  so  desired,  the  program  removes  a  linear  trend 
from  each  series.  The  trend  is  estimated  using 


least  squares  applied  to  the  initial  “patched”  se¬ 
ries.  The  resulting  series  is 

x 2(t)  =  *i(t)  —  a  —  bt, 

where  a  is  the  estimated  constant  and  b  is  the 
estimated  slope. 

5.  The  sample  periodogram  is  automatically  esti¬ 
mated  and  plotted  for  the  interpolated  and  de¬ 
trended  data,  xi(t).  (see,  for  example,  Priestly 
(1981)  for  a  definition  of  the  periodogram  and  its 
interpretation). 

6.  The  program  calculates  the  probability  of  obtain¬ 
ing  the  computed  values  for  the  20  largest  val¬ 
ues  of  the  norm  alined  periodogram  under  the  as¬ 
sumption  that  X2  (t)  is  a  Gaussian  white  nobe  pro¬ 
cess.  A  small  probability  indicates  that  a  cycle 
may  be  present  in  the  data.  The  probabilities 
are  computed  using  the  large  sample  test  statis¬ 
tic  for  Ij,j  =  1,2,  [n/2],  where  denotes  the 
jth  largesc  ordinate  of  the  periodogram.  (Priestly 
1981,  p.407). 

7.  Using  the  information  obtained  in  the  previous 
step,  as  well  as  any  intuitive  or  physical  knowledge 
of  cyclic  behavior  in  the  series,  the  user  specifies 
cycles  to  be  estimated  and  removed  from  the  in¬ 
terpolated  and  detrended  data,  if  desired.  The 
cycles  are  assumed  to  be  of  the  form 

J 

3t  -  y"{7jg03(w;t)  --  sin (>/,£)}, 

;=i 

where  u;;-  =  2t/;  are  the  frequencies  that  you 
would  like  to  remove  and  J  is  the  number  of  cy¬ 
cles.  The  7y  and  3,  are  estimated  using  least 
squares.  The  resulting  series  Is  xj(£)  =  xz(£)  —  it. 

3.  A  first  order  autoregressive  (AR)  model  is  fitted 
to  the  detrended  and  deseascr.alized  data  x3(t). 
.An  AR(1)  model  has  the  fern 

ij(t)  =  ©X3 (t  -  1)  -  o*(t),  t  =  2, 3, . . . ,  n. 

We  assume  that  o^c)  ~  iid.V(0,  cr\j.).  The  pa¬ 
rameter  3)  is  estimated  using  least  squares.  The 
residual  series  dx(t)  is 

d*(£)  -  x3(£)  -  $xz(t  -  1),  t  =  2,  3, ....  n. 

9.  The  variance  of  {dj(£)}  is  calculated  and  a  white 
noise  series  of  length  n  having  the  distribution 
.V(0,  is  generated.  Denote  this  series  by 
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10.  Let  l  denote  the  length  of  a  particular  gap  in  the 

series  and  let  i3(£)  and  r3(£  -4  l  -4  1)  denote  the 
points  on  either  side  of  the  gap.  The  program 
forecasts  and  backcasts  from  each  end  of  the  gap 
using  the  following  recursive  equations  for  j  = 
1,2 . 1. 

z3 ( £  4-  j)  =  <pXj(t  -rj  -  1)  4-  a!x(t  +  j ) 
X3(t  4  1  -4  1  —  j)  —  <hx3(t  4*  l  -4  2  —  j) 

-4  dj(t  -41-41—  j). 

Then  the  interpolated  value  is 
=3(4  +  j)  =  Wlj23(t  -4  j)  4-  W->jX3(t  -4  1  -4  1  -  j), 
where  =  1  —  (j/l  4- 1)  and  u>2j  —  1  —  w\j. 

11.  If  interpolating  values  for  two  correlated  series, 
the  standard  deviation  of  the  residual  series 
{^(t)}  and  {dy(t)}  found  in  Step  8  is  calculat¬ 
ed  and  the  sample  cross  correlation  at  lag  0  be¬ 
tween  {<!*(£)}  and  {dy(t)}  is  computed.  Denote 
this  by  c.  A  white  noise  series  of  length  n  having 
the  distribution  iV(0,  <7^  )  is  generated.  Denote 
this  series  by  a'y(t).  A  second  white  noise  series  of 
length  n  is  generated  using  the  following  relation: 

S (t)  =  4-  y/l  -c?ay(t). 

12.  The  values  for  the  y3(t)  series  are  interpolated  as 
in  Step  9,  with  a'#(t  -4  j)  replaced  by  a'y'(t  -4  j) 

13.  The  estimated  trend  and  cycle  are  added  back  to 
the  interpolated  series  and  the  series  containing 
the  final  estimates  of  the  missing  values  is  plotted. 

14.  The  user  may  choose  to  plot  the  coherence  func¬ 
tion  for  the  detrended  and  deseasonaiized  inter¬ 
polated  series  if  desired. 

15.  The  user  may  also  choose  to  plot  more  detailed 
segments  of  the  final  interpolated  series  if  desired. 

Using  a  weighted  average  of  backward  and  forward 
forecasts  made  from  each  end  of  a  gap  using  an  es¬ 
timated  univariate  ARMA  modei,  as  was  used  in  Step 
10,  was  discussed  by  Abraham  (1981).  He  calculates 
the  weights  to  minimize  the  mean-squared  error  of  the 
interpolated  value,  thus  the  weights  depend  in  a  com¬ 
plicated  way  on  both  the  estimated  modei  parameters 
and  the  length  of  gap.  Our  method,  although  sim¬ 
ple,  is  intuitively  appealing  in  that  it  gives  less  weight 
to  interpolated  values  at  long  lead  times  and  is  easy 
to  implement  when  the  lengths  of  the  gaps  are  dif¬ 
ferent.  .Voce  chac  in  Step  10,  we  include  a  simulated 


noise  term  in  the  usual  expressions  for  the  backward 
and  forward  forecasts  of  an  AR(1)  model.  This  is  to 
eliminate  unrealistic  “smoothness"  in  the  interpolated 
values,  which  will  occur  if  the  gap  is  very  long.  Real¬ 
istic  noise  in  the  series  is  important  if  the  interpolated 
series  will  be  used  to  estimate  the  spectral  density  of 
the  series.  Similarly,  in  Step  11  we  incorporate  an  esti¬ 
mate  of  the  contemporaneous  correlation  between  the 
two  series  into  the  estimates  of  the  interpolated  values 
of  the  second  series  by  including  a  noise  term  taken 
from  a  simulated  bivariate  Normal  pair  of  series  with 
correlation  c.  The  method  for  generating  a  bivariate 
Normal  pair  with  specified  correlation  is  taken  from 
Lewis  and  Orav  (1989,  p.301).  A  discussion  of  con¬ 
temporaneous  bivariate  time  series  models  made  be 
found  in  Camacho,  Hipel,  and  McLeod  (198T). 

3  An  Example 

We  apply  the  above  algorithm  to  a  vector  pair  of 
ocean  current  velocities  collected  off  Point  Sur,  Cal¬ 
ifornia  over  the  period  0000  hours,  Sept.  19,  1990 
to  2300  hours,  Oct.  30,  1990,  a  total  period  of  1008 
hours.  Current  velocities  are  just  one  set  of  variables 
which  are  collected  regularly  by  oceanographers  at  the 
Naval  Postgraduate  School  in  order  to  provide  infor¬ 
mation  related  to  the  long  term  variability  in  sea  sur¬ 
face  temperatures  off  the  California  coast.  The  veloci¬ 
ties  were  measured  using  a  paddlewheel  and  electronic 
counter  assembly  located  at  the  top  of  the  recording 
unit  placed  at  a  depth  of  350  meters.  Velocity,  in  units 
of  cm/s,  was  determined  from  the  number  of  revolu¬ 
tions  made  by  the  paddlewheel  during  each  sampling 
interval,  to  an  accuracy  of  ±1.0  cm/s.  The  data  was 
initially  recorded  at  30  minute  intervals.  After  initial 
visual  inspection  for  outliers  or  periods  suspect  of  in¬ 
strument  failures,  and  manual  editing  if  necessary,  the 
data  was  filtered  using  a  Cosine- Lanczos  filter  with 
a  centered  25  point  data  window  and  interpolated  to 
specified  50  minute  intervals.  At  this  point,  there  re¬ 
mained  a  period  of  63  consecutive  hours  of  missing  da¬ 
ta,  in  which  the  data  gathering  instruments  were  not 
working.  There  were  also  a  few  scattered  individuai 
mining  values. 

Figures  1  and  2  show  plots  of  the  E-W  (or  u)  com¬ 
ponent  of  the  current  velocity,  and  the  N-S  (or  v)  com¬ 
ponent  of  the  current,  with  missing  values  coded  as  0. 
Figures  3  and  4  depict  the  same  series  with  missing 
values  “patched”  using  linear  interpolation.  There  ap¬ 
pear  to  be  regular  cycles  in  the  data.,  as  expected  for 
current  data,  as  well  as  the  presence  of  a  long  term 
trend.  We  fit  a  linear  trend  to  both  the  u  and  v  cur- 
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rent  components,  with  estimated  constants  of  3.91  and 
1.60,  respectively,  and  estimated  slope  coefficients  of 
-0.0018  and  -0.0003.  Standard  £-tests  on  the  signifi¬ 
cance  of  the  regression  coefficients  are  not  appropriate 
because  the  residuals  are  not  assumed  to  be  uncor¬ 
related.  Figures  5  and  6  show  the  periodogram  for 
each  detrended  series.  Diurnal  and  semi-diurnal  cy¬ 
cles  are  clearly  indicated  for  the  v- component  of  cur¬ 
rent  velocity,  with  only  the  diurnal  cycle  clearly  seen 
in  the  u-component.  In  addition,  there  appears  to  be 
some  long  range  dependence  in  the  data,  as  evidenced 
by  the  large  values  of  the  periodogram  at  small  fre¬ 
quencies,  which  remain  even  after  the  removal  of  a 
long  term  trend.  See  Lewis  and  Ray  (1992)  for  a  dis¬ 
cussion  of  long  range  dependence  in  sea  surface  data. 
Based  on  the  approximate  p- values  of  the  test  statistic 
for  each  of  the  20  largest  periodogram  ordinates  and 
knowledge  of  the  tidal  cycles,  we  remove  cycles  at  fre¬ 
quencies  (in  cycles  per  1008  hours)  81,  82,  and  84  in 
the  u-component  and  frequencies  42,  81,  82,  and  84 
in  the  u-component.  Table  1  gives  the  values  of  the 
normalized  periodogram  values  at  these  ordinates  and 
the  resulting  p-values  for  the  test  statistic.  After  this 
step,  the  missing  values  are  automatically  estimated 
for  the  detrended  and  deseasonalized  data.  Figures  7 
and  3  show  the  two  series  with  final  estimates  of  the 
missing  values,  after  the  trend  and  cycles  have  been 
added  back  to  the  series.  The  estimated  values  appear 
to  follow  the  pattern  of  che  data  quite  nicely. 

We  study  the  correlation  between  the  two  series 
in  more  detail  by  looking  at  the  coherence  function 
for  the  two  interpolated,  detrended  and  deseasonalized 
series.  The  coherence  is  Initially  computed  assuming 
a  cosine  arch  window  of  length  7.  The  user  has  the 
option  of  changing  the  window  at  any  point  in  the 
coherence  analysis.  Figure  9  shows  the  cross-phase, 
cross-amplitude,  and  cross-coherence  functions  of  the 
two  series.  The  cross-amplitude  spectrum  shows  de¬ 
pendence  between  the  series  at  fairly  low  frequencies, 
but  the  (normalized)  coherence  measure  shows  that 
che  dependence  extends  to  high  frequencies  as  well. 
No  systematic  effect  can  be  seen  in  the  phase  spec¬ 
trum. 


Additionally,  the  enlarged  segment  in  Figure  10 
depicts  che  two  series  with  values  plotted  as  vertical 
lines  drawn  from  the  x-axis.  The  segment  partially 
includes  the  incerpoiaced  values  shown  in  Figures  7 
and  3.  No  apparent  difference  between  the  known  and 
the  interpolated  values  can  be  seen. 


4  Summary 

We  have  presented  a  simple  algorithm  which  permits 
interpolation  of  arbitrary  patterns  of  missing  values  in 
both  univariate  and  bivariate  time  series,  allowing  for 
the  possibility  of  non-stationarity.  The  implementa¬ 
tion  is  interactive  and  has  graphical  capabilities  avail¬ 
able  at  each  step.  It  is  also  much  easier  to  implement 
in  practice  than  the  state-space  approach  of  Ansley 
and  Kohn(1985),  which  requires  modified  versions  of 
the  Kalman  filter  and  the  fixed  point  smoothing  al¬ 
gorithm.  The  estimated  contemporaneous  correlation 
between  the  two  series  is  used  in  the  interpolation  al¬ 
gorithm  In  order  to  estimate  the  missing  values  in  a 
manner  that  is  consistent  with  the  rest  of  the  data. 
Although  we  have  assumed  that  each  series  follows  a 
simple  A-R(l)  model,  the  algorithm  could  easily  be  ex¬ 
tended  to  model  each  series  as  an  ARMAfp,  q)  model, 
with  the  orders  of  p  and  q  chosen  by  the  user  after 
examination  of  the  sample  autocorrelation  and  par¬ 
tial  autocorrelation  functions  of  the  detrended  and  de¬ 
seasonalized  “patched”  data.  Functions  necessary  to 
compute  the  sample  correlation  functions  and  estimate 
the  parameters  of  an  ARMA(p,  q)  model  axe  already 
present  in  the  IBM  APL2  AGSS  package.  (The  AGSS 
application  disdussed  here  is  available  from  the  au¬ 
thors;  the  AGSS  package  is  available  from  IBM.) 
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Tables  and  Figures 


Table  1:  Normalized  Periodogram  Values  for  Ocean 
Current  Velocities _ 


U-component  of  current  velocity 

|  Frequency 

Norm.  Periodogram  Value 

p-value  | 

1  1 

106.14 

0.00 

31 

88.52 

0.00 

i  9 

35.67 

0.00 

;  32 

22.42 

0.00 

34 

20.35 

0.00 

;  V-component  of  current  velocity 

Frequency 

Norm.  Periodogram  Value 

p-value 

31 

195.51 

0.00 

1 

69.88 

0.00 

84 

34.31 

0.00 

2 

20.90 

0.00 

32 

14.40 

0.00 

42 

10.09 

0.11 
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Figure  1:  Lf-componenc  of  Current  Velocity  (missing 
values  coded  a3  0’s) 
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Figure  2:  V-component  of  Current  Velocity  (missing 
values  coded  as  0’s) 
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PUT  Of  UCJRRENT  WITO  (NITtAL  ESTIMATES  Of  WSSNG  VAUtS 


ESTlMATEO  SPEOTPAL  OENSTT  PCS  JCURREVT 


Figure  3:  U-component  of  Current  Veiodty  (missing  Figures:  Periodogram  of  U-component  of  Current  Ve- 
values  linearly  interpolated)  locity  after  linear  detrending 
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Figure  4:  V-component  of  Current  Veiodty  (missing  Figures:  Periodogram  of  V-component  of  Current  Ve- 
vaiues  linearly  interpolated)  locity  after  linear  detrending 
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PLOT  OF  UCJRRENT  WITH  FINAL  ESTIMATES  OF  MISSING  VALLES 


Figure  7:  U-componenc  of  Current  Velocity  (missing 
values  estimated  using  complete  interpolation  proce¬ 
dure) 


Figure  9:  Cross-amplitude  Spectrum  (top),  Phase 
Spectrum  (middle),  and  Coherence  (bottom)  of 
U-component  and  V-component  of  Current  Velocity 
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Figure  S:  V-component  of  Current  Velocity  (missing 
values  estimated  using  the  complete  interpolation  pro¬ 
cedure) 


Figure  10:  Detailed  segment  of  U-component  and 
V-component  of  Current  Velocity  after  final  estima¬ 
tion  of  mining  values 
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